Resilient Work Stealing
نویسندگان
چکیده
Future generations of processors will exhibit an increase of faults over their lifetime, and it becomes increasingly expensive to solve the resulting reliability issues purely at the hardware level. We propose to model computations in terms of restartable task graphs in order to improve reliability at the software level. As a proof of concept, we present Cobra, a novel design for a shared-memory work-stealing scheduler that realizes this notion of restartable task graphs, and enables computations to survive hardware failures due to soft errors. A comparison with the work-stealing scheduler of Threading Building Blocks on the PARSEC benchmark suite shows that Cobra incurs no performance overhead in the absence of failures, and low performance overheads in the presence of single and multiple failures.
منابع مشابه
Paving the Road to Exascale with Many-Task Computing
Exascale systems will bring significant challenges. This work attempts to addresses them through the Many-Task Computing (MTC) paradigm, by delivering data-aware job scheduling systems and fully asynchronous distributed architectures. MTC applications are structured as DAG graphs of tasks, with dependencies forming the edges. The asynchronous nature of MTC makes it more resilient than tradition...
متن کاملFault Tolerant Scheduling for Parallel Loops on Shared Memory Systems
While multicore/multiprocessor systems achieve significant speedup for many applications by exploiting loop level parallelism, they also suffer from increased reliability problems as a result of ever scaling device size. This paper addresses the reliability of loop dominated applications, aiming to execute parallel loops efficiently in the presence of various types of hardware faults. In this p...
متن کاملEfficient Work Stealing for Portability of Nested Parallelism and Composability of Multithreaded Program
We present performance evaluations of parallel-for loop with work stealing technique. The parallel-for by work stealing transforms the parallel-loop into a form of binary tree by making use of method of divide-and-conquer. Iterations are distributed in the leaves procedures of the binary tree, and the parallel executions are performed by stealing subtrees from the bottom of the tree. The work s...
متن کاملFaster Work Stealing With Return Barriers
Work-stealing is a promising approach for effectively exploiting software parallelism on parallel hardware. A programmer who uses work-stealing explicitly identifies potential parallelism and the runtime then schedules work, keeping otherwise idle hardware busy while relieving overloaded hardware of its burden. However, work-stealing comes with substantial overheads. Our prior work demonstrates...
متن کاملGreen Thieves in Work Stealing
This paper proposes an energy-efficient approach for programming languages that support work stealing. The key insight is that thieves and victims in the work stealing algorithm can coordinate their execution paces for more energy efficiency, through dynamic adjustment of CPU frequencies.
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1706.03539 شماره
صفحات -
تاریخ انتشار 2017